explorer: heatmap SQL pre-aggregation + adaptive radius (#233) by rdhyee · Pull Request #241 · isamplesorg/isamplesorg.github.io

rdhyee · 2026-05-28T01:25:47Z

Summary

Follow-up to PR #240 (heatmap phase 1). Two related changes:

1. SQL pre-aggregation: replaces the LIMIT 100000 raw-row scan + JS per-pixel binning with a DuckDB GROUP BY that does the binning inside the query. Removes the cap outright: every sample in the bbox is counted, regardless of total sample count.
2. Adaptive radius + maxOpacity: fixes the "everything red" symptom RY reported at world view after #240 (heatmap overlay spike, phase 1) shipped.

Why #1: LIMIT 100000 was geographically biased

LIMIT 100000 returned the first 100k rows in parquet storage order — not random, not geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR — the largest source by row count). The "(capped)" status warning from #240 disclosed the problem but didn't fix it.

This PR pushes the binning into DuckDB. The SQL computes (x_bin, y_bin) pixel coordinates in the query itself using FLOOR/LEAST/GREATEST, then GROUP BY (x_bin, y_bin) returns one row per non-empty pixel with COUNT(*) as the count. Result cardinality is bounded by the number of canvas pixels (≤ 512² = 262,144), independent of how many samples fall in the bbox. No LIMIT is needed: every sample is counted into its true pixel bucket.
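A minimal sketch of the kind of query this produces, written as a JavaScript string builder. It assumes the `lite` table and `latitude`/`longitude` columns from the pre-PR query quoted in the commit message, a north-up canvas with y = 0 at the bbox's northern edge, and a non-wrapping bbox. `buildHeatmapSql` is a hypothetical name, the explorer's filter clauses are omitted, and real code would bind parameters rather than interpolate numbers.

```javascript
// Hypothetical sketch: build a DuckDB query that bins samples into canvas
// pixels and returns one row per non-empty pixel (no LIMIT needed).
function buildHeatmapSql({ west, south, east, north }, width, height) {
  if (!(west < east)) {
    throw new Error("non-wrapping bbox expected (west < east)");
  }
  const lonSpan = east - west;
  const latSpan = north - south;
  // GREATEST/LEAST clamp the bin so samples exactly on the east/south edge
  // land in the last pixel instead of one past the canvas.
  const xBin = `LEAST(GREATEST(CAST(FLOOR((longitude - ${west}) / ${lonSpan} * ${width}) AS INTEGER), 0), ${width - 1})`;
  const yBin = `LEAST(GREATEST(CAST(FLOOR((${north} - latitude) / ${latSpan} * ${height}) AS INTEGER), 0), ${height - 1})`;
  return [
    `SELECT ${xBin} AS x_bin, ${yBin} AS y_bin, COUNT(*) AS n`,
    "FROM lite",
    `WHERE longitude BETWEEN ${west} AND ${east}`,
    `  AND latitude BETWEEN ${south} AND ${north}`,
    "GROUP BY x_bin, y_bin", // DuckDB accepts SELECT aliases in GROUP BY
  ].join("\n");
}

console.log(buildHeatmapSql({ west: -10, south: 30, east: 40, north: 50 }, 512, 512));
```

The result set has at most `width * height` rows however many samples match, which is what makes dropping the LIMIT safe.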

Antimeridian handled: when the bbox wraps (west > east), the SQL shifts longitudes below west by +360 so the pixel arithmetic works in a continuous coordinate space.
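The per-sample arithmetic, including the antimeridian shift, can be mirrored in plain JavaScript, which is handy for unit-testing the bin math outside DuckDB. `shiftLongitude` and `pixelBin` are hypothetical helpers, not code from the PR; they assume a north-up canvas and the same FLOOR/LEAST/GREATEST clamping described above.

```javascript
// Hypothetical JS mirror of the SQL bin arithmetic (not code from the PR).
// Canvas is north-up: y = 0 at the bbox's northern edge.

// Wrapping bbox (west > east, e.g. west = 170, east = -170): longitudes
// below `west` lie east of the antimeridian, so shift them by +360 to get
// one continuous range [west, east + 360].
function shiftLongitude(lon, west, east) {
  return west > east && lon < west ? lon + 360 : lon;
}

function pixelBin(lon, lat, { west, south, east, north }, width, height) {
  const eastCont = west > east ? east + 360 : east;
  const x = Math.floor(((shiftLongitude(lon, west, east) - west) / (eastCont - west)) * width);
  const y = Math.floor(((north - lat) / (north - south)) * height);
  // Same clamping the query does with GREATEST(..., 0) and LEAST(..., n - 1).
  const clamp = (v, hi) => Math.min(Math.max(v, 0), hi);
  return [clamp(x, width - 1), clamp(y, height - 1)];
}

const wrapped = { west: 170, south: -10, east: -170, north: 10 };
console.log(pixelBin(-175, 0, wrapped, 512, 512)); // [ 384, 256 ]
```

Here -175° becomes 185°, three quarters of the way across the 20°-wide wrapped span, so it lands at x = 384 rather than being dropped or folded onto the western edge.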

Verified counts vs the existing samples table summary line (= true sample count for the current view):

view (altitude)	heatmap count	samples-table count	match
PKAP (100km alt)	77,840	77,840	✅
Cyprus medium (500km)	100,970	100,970	✅ (was capped at 100k)
Cyprus regional (1,500km)	682,029	682,029	✅ (was capped at 100k)
Africa (1.9Mkm)	12,875	12,875	✅
World view (15Mkm)	5,980,282	5,980,282	✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost — similar to or faster than the LIMIT 100k version. Status text always reports the true count; the (capped) branch is removed.

Why #2: adaptive radius

After (1), RY tested staging and reported that world view was "everything is red." Cause: with 35k+ pixel cells on a 512² canvas, heatmap.js's default 25-pixel blur radius made each cell's Gaussian cover ~1% of the canvas. 35k cells × ~1% each is far more than 100% coverage, so linear-additive blending saturated everything to full red.
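A back-of-envelope check of that estimate, assuming each blob is a disc of radius r and that cells are spread uniformly over the canvas (real sample density is far from uniform, so this is only an order of magnitude). A 25 px disc covers about 0.75% of a 512² canvas, which the text rounds to ~1%; across 35k cells that is a couple of hundred overlapping blobs per pixel on average.

```javascript
// Back-of-envelope overlap estimate (illustrative only): treat each blob as
// a disc of area π·r² and assume cells are spread uniformly over the canvas.
function meanOverlap(cellCount, radiusPx, canvasPixels) {
  const blobFraction = (Math.PI * radiusPx ** 2) / canvasPixels; // share of canvas one blob covers
  return { blobFraction, overlap: cellCount * blobFraction };   // blobs covering an average pixel
}

const before = meanOverlap(35000, 25, 512 * 512);
console.log((before.blobFraction * 100).toFixed(2) + "% of canvas per blob"); // 0.75% of canvas per blob
console.log(Math.round(before.overlap) + " blobs per pixel on average");        // 262 blobs per pixel on average
```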

Two complementary fixes:

maxOpacity: 0.6 on the heatmap.js instance — caps the rendered alpha so dense areas don't fully wash out the satellite imagery
Per-point radius computed as sqrt(canvas_pixels / cell_count) × 2, clamped to [6, 30]. World view (35k cells) → radius = 6, the floor (tight pixel dots, no overlap saturation). Cyprus medium (~400 cells) → radius = 30, the cap (smooth blobs as before).
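A sketch of the adaptive-radius rule and of handing the aggregated rows to heatmap.js. `adaptiveRadius` and `toHeatmapData` are hypothetical names and the `{ x_bin, y_bin, n }` row shape is assumed; the output follows heatmap.js's `setData({ max, data })` format, where each data point may carry its own `radius`.

```javascript
// Sketch of the adaptive per-point radius: 2 * sqrt(canvasPixels / cellCount),
// clamped to [6, 30]. Fewer cells -> larger, smoother blobs.
const MIN_RADIUS = 6;
const MAX_RADIUS = 30;

function adaptiveRadius(cellCount, canvasPixels) {
  if (cellCount <= 0) return MAX_RADIUS; // nothing to draw; any value works
  const raw = 2 * Math.sqrt(canvasPixels / cellCount);
  return Math.min(MAX_RADIUS, Math.max(MIN_RADIUS, raw));
}

// Hypothetical helper: convert GROUP BY rows ({ x_bin, y_bin, n }) into the
// { max, data: [{ x, y, value, radius }] } object passed to heatmap.js setData().
function toHeatmapData(rows, width, height) {
  const radius = adaptiveRadius(rows.length, width * height);
  let max = 0;
  const data = rows.map(({ x_bin, y_bin, n }) => {
    max = Math.max(max, n); // normalize colors against the densest pixel
    return { x: x_bin, y: y_bin, value: n, radius };
  });
  return { max, data };
}
```

On a 512² canvas, 35k cells give a raw radius of about 5.5, clamped up to 6, and ~400 cells give about 51, clamped down to 30, matching the figures above. On the rendering side the instance would be created with something like `h337.create({ container, maxOpacity: 0.6 })` and fed with `heatmap.setData(toHeatmapData(rows, 512, 512))`.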

World view now shows geographic structure instead of solid red. Tight zooms unchanged visually.

Test plan

tests/playwright/heatmap-overlay.spec.js 5/5 pass on localhost
Visually verified on rdhyee staging at the URLs RY surfaced (Africa-wide alt=7.4Mkm, Atlantic alt=15Mkm). World view shows structure; smaller zooms unchanged.
Numbers match table exactly at all zoom levels (table above)
Verify on production after merge

Out of scope

Cluster dots still don't align with heatmap hotspots at cluster-mode altitudes (cluster = H3 centroids; heatmap = real positions). Phase 3 (promoting the heatmap to a third mode that hides cluster dots when the heatmap is on) is separate work.
The HEATMAP_LIMIT constant (= 100,000) is no longer referenced but is left in the code for phase 2, in case a safety cap on cell count is reintroduced.

Provenance

Authored by Claude in response to RY's feedback ("wondering whether we can do better geographic random sampling"). The approach (SQL pre-aggregation by pixel cell) was chosen over TABLESAMPLE because it removes the cap entirely rather than just making the sampling random. The adaptive radius was added in response to RY's second round of feedback, that world view was washing out to red.

Cross-refs

explorer: spike a progressive heatmap layer as filter-honest alternative to cluster mode #233 — progressive heatmap spike (this is phase 1.5; phase 2 = progressive refinement)
explorer: heatmap overlay spike — phase 1 (#233) #240 — heatmap phase 1 (predecessor; introduced the LIMIT cap that this PR removes)
explorer: architectural direction — make filter semantics coherent across all surfaces #234 — explorer-filter-coherence roadmap (heatmap is the C-side solution)

Two related changes that follow up PR isamplesorg#240 (heatmap phase 1):

1. SQL pre-aggregation removes the LIMIT 100000 cap honestly.
2. Adaptive per-point radius + maxOpacity caps avoid blur-overlap saturation at high-cell-count views (the world view "everything red" symptom RY surfaced after isamplesorg#240 shipped).

## (1) SQL pre-aggregation

Previously: `SELECT latitude, longitude FROM lite WHERE bbox AND filters LIMIT 100000`, then bin per pixel in JS. Two problems:

- LIMIT 100000 returned the first 100k rows in parquet storage order: NOT random, NOT geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR, the largest by row count). The "(capped)" status warning disclosed the problem but didn't fix it.
- For sample sets above the cap, the density was unfaithful.

Now: SQL computes pixel cell coords in the query using FLOOR / LEAST / GREATEST, then GROUP BY (x, y) returns one row per non-empty pixel with COUNT(*) as the count. Result cardinality is bounded by canvas pixels (≤ 512² = 262k), independent of how many samples the bbox contains. No LIMIT needed: every sample is counted into its true pixel bucket.

Antimeridian handled: when the bbox wraps (west > east), SQL shifts longitudes < west by +360 so pixel arithmetic works in a continuous coordinate space.

Verified counts vs `samples table` summary line (= true sample count for the current view):

view              | heatmap  | table    | match
------------------|----------|----------|------
PKAP (100km)      | 77,840   | 77,840   | ✅
Cyprus medium     | 100,970  | 100,970  | ✅ (was capped at 100k)
Cyprus regional   | 682,029  | 682,029  | ✅ (was capped at 100k)
Africa (1.9Mkm)   | 12,875   | 12,875   | ✅
World view        | 5.98M    | 5.98M    | ✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost, similar to or faster than the LIMIT 100k version.

`HEATMAP_LIMIT` constant left in place but no longer used (kept for back-compat in case phase 2 reintroduces a safety cell-count cap).

## (2) Adaptive radius + maxOpacity

After (1), RY tested staging and reported world view "everything is red." Cause: with 35k+ pixel cells on a 512² canvas, heatmap.js's default 25-pixel blur radius made each cell's Gaussian blur cover ~1% of the canvas. 35k × ~1% is far more than 100% coverage, so linear-additive blending saturated everything.

Two complementary fixes:

- `maxOpacity: 0.6` on the heatmap.js instance config. Caps the rendered alpha so dense areas don't fully wash out the satellite imagery underneath.
- Per-point radius computed from `sqrt(canvas_pixels / cell_count) * 2`, clamped to [6, 30]. World view (35k cells) → radius ≈ 6px (tight pixel dots, no overlap). Cyprus medium (~400 cells) → radius = 30px (cap, smooth blobs as before).

Together: world view shows geographic structure instead of solid red. Tight zooms unchanged visually.

## Test plan

- `tests/playwright/heatmap-overlay.spec.js` 5/5 still pass on localhost.
- Visually verified on rdhyee staging at the URLs RY surfaced (Africa-wide, Atlantic alt=15Mkm). World view now shows structure; tight zooms unchanged.

## Provenance

Authored by Claude, prompted by RY ("wondering whether we can do better geographic random sampling"). Approach (Option C from Claude's menu: SQL pre-aggregation by pixel cell) recommended over TABLESAMPLE because it removes the cap entirely rather than just making the sampling random.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee force-pushed the feat/heatmap-sql-aggregation branch from b631b4d to fb85ff0 (May 28, 2026 01:30)

rdhyee force-pushed the feat/heatmap-sql-aggregation branch from fb85ff0 to 4a74b8f (May 28, 2026 01:34)

rdhyee merged commit f9535ee into isamplesorg:main May 28, 2026
1 check passed

rdhyee merged 1 commit into isamplesorg:main from rdhyee:feat/heatmap-sql-aggregation
